Greedy-layer pruning: Speeding up transformer models for natural language processing
Authors
Abstract
Fine-tuning transformer models after unsupervised pre-training reaches a very high performance on many different natural language processing tasks. Unfortunately, transformers suffer from long inference times which greatly increases costs in production. One possible solution is to use knowledge distillation, which solves this problem by transferring information from large teacher models to smaller student models. Knowledge distillation maintains high performance and high compression rates; nevertheless, the size of the student model is fixed after pre-training and can not be changed individually for a given downstream task or use-case to reach a desired performance/speedup ratio. Another solution to reduce the size of models in a much more fine-grained and computationally cheaper fashion is to prune layers after pre-training. The price to pay is that layer-wise pruning algorithms are not on par with state-of-the-art knowledge distillation methods. In this paper, Greedy-layer pruning is introduced to (1) outperform current state-of-the-art layer-wise pruning, (2) close the performance gap when compared to knowledge distillation, while (3) providing a method to adapt the model size dynamically to reach a desired performance/speedup tradeoff without the need of additional pre-training phases. Our source code is available at https://github.com/deepopinion/greedy-layer-pruning.
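The abstract describes a greedy procedure: layers are dropped one at a time from the pre-trained encoder, always removing the layer whose removal hurts downstream performance the least, until the desired performance/speedup tradeoff is reached. The sketch below illustrates such a greedy loop; the function names (`greedy_layer_pruning`, `evaluate`) and the score-based selection are illustrative assumptions, not the authors' exact implementation from the linked repository.

```python
# Hypothetical sketch of greedy layer pruning (not the authors' exact code).
# Assumes `evaluate(layer_indices)` fine-tunes/evaluates the model restricted
# to the given encoder layers and returns a validation score (higher is better).

from typing import Callable, List

def greedy_layer_pruning(
    num_layers: int,
    evaluate: Callable[[List[int]], float],
    layers_to_remove: int,
) -> List[int]:
    """Greedily drop one encoder layer per round, always removing the layer
    whose removal costs the least validation performance."""
    kept = list(range(num_layers))
    for _ in range(layers_to_remove):
        best_score, best_layer = float("-inf"), None
        for layer in kept:
            candidate = [l for l in kept if l != layer]
            score = evaluate(candidate)      # downstream-task validation score
            if score > best_score:
                best_score, best_layer = score, layer
        kept.remove(best_layer)              # commit the cheapest removal
    return kept                              # indices of layers to keep
```

In such a setup, `evaluate` would briefly fine-tune the pruned model on the target task's validation split, so the model size can be adapted per task without any additional pre-training phase.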
Similar Resources
Speeding Up FastICA by Mixture Random Pruning
We study and derive a method to speed up kurtosis-based FastICA in presence of information redundancy, i.e., for large samples. It consists in randomly decimating the data set as much as possible while preserving the quality of the reconstructed signals. By performing an analysis of the kurtosis estimator, we find the maximum reduction rate which guarantees a narrow confidence interval of such ...
Connectionist Models for Natural Language Processing Program
The scientific adequacy of models based on a small number of coarse-grained primitives (e.g. conceptual dependency), popular in AI during the 70's, has been called into question and substantially replaced by a current emphasis in much of computational linguistics on lexicalist models (i.e., ones which use words for representing concepts or meanings). However, few people can doubt that words are...
TFLEX: Speeding Up Deep Parsing with Strategic Pruning
This paper presents a method for speeding up a deep parser through backbone extraction and pruning based on CFG ambiguity packing. The TRIPS grammar is a wide-coverage grammar for deep natural language understanding in dialogue, utilized in 6 different application domains, and with high coverage and sentence-level accuracy on human-human task-oriented dialogue corpora (Dzikovska, 2004). The TR...
Speeding up LFG Parsing Using C-Structure Pruning
In this paper we present a method for greatly reducing parse times in LFG parsing, while at the same time maintaining parse accuracy. We evaluate the methodology on data from English, German and Norwegian and show that the same patterns hold across languages. We achieve a speedup of 67% on the English data and 49% on the German data. On a small amount of data for Norwegian, we achieve a speedup...
Journal
Journal title: Pattern Recognition Letters
Year: 2022
ISSN: 1872-7344, 0167-8655
DOI: https://doi.org/10.1016/j.patrec.2022.03.023